Abstract

The present paper reviews the graphical and nongraphical methods for estimating 
multivariate normality. Prior to exploring this methodology, a foundation will first be established 
by presenting ways to assess univariate and bivariate normality. A data set of three variables used 
by Stevens (1986) is analyzed using Q-Q plots, stem and leaf plots, histograms, skewness and 
kurtosis coefficients, the Shapiro-Wilk statistic, and bivariate and multivariate scatterplots. 
Multivariate normality is explored in terms of calculating Mahalanobis distances and plotting them 
on a scattergram against derived chi-square values using Fortran and SPSS programs developed 
by Thompson (1990, 1997). 
Ways to Evaluate the Assumption of Multivariate Normality
Multivariate analyses are vital to the social sciences in the exploration of a dynamic
environment. Fish (1988) and Thompson (1994) stated that use of multivariate methods is vital
for two reasons. First, multivariate methods avoid the inflation of experimentwise Type I error 
rates that occur when univariate methods are employed in a single study to test multiple 
hypotheses that are at least partially uncorrelated. Secondly, and more importantly, multivariate 
methods analytically honor a substantive reality in which most effects have multiple causes and 
multiple consequences. 

The trend toward utilization of multivariate methods has increased over the past two 
decades, as noted by Emmons, Stallings, and Layne (1990) and Grimm and Yarnold (1995). The 
former group of researchers studied 16 years of research reports in three journals and found that 
the multivariate characteristic of the social science research environment with its 
many confounding or intervening variables has been addressed through the trend 
toward increased use of multivariate analysis of variance and covariance, multiple 
regression, and multiple correlation. (p. 14)

The latter group of researchers noted that, “In the last 20 years, the use of multivariate statistics 
has become commonplace. Indeed, it is difficult to find empirically based articles that do not use 
one or another multivariate analysis” (p. vii). 

Because these methods are gaining in popularity, it is important to understand the 
assumptions underlying multivariate statistical techniques, one of which is multivariate normality. 
It is imperative to remember that multivariate normality is basic to the statistical significance 
inference procedure of multivariate analysis (Marascuilo & Levin, 1983). The purpose of the
present paper is to review the graphical and nongraphical methods for estimating multivariate
normality. Prior to exploring this methodology, a foundation will first be established by 
presenting ways to assess univariate and bivariate normality. 

Normality 

Parametric tests require the estimation of at least one population parameter from the
sample statistics. To make the estimation, certain assumptions must be made, the most important
of which is that the variable measured in the sample is normally distributed in the population to
which it is to be generalized (Munro & Page, 1993). It is important to remember that the normal
curve is a mathematical model that depends upon the mean and the standard deviation, in the
restrictive sense that the mean and the standard deviation are used to calculate skewness and
kurtosis. Skewness and kurtosis quantitatively evaluate the normality of the distribution, with
skewness referring to the asymmetry of the curve and kurtosis referring to the tallness or flatness
of the curve (Bump, 1991).

The properties of the normal curve include the following:

1. The curve is symmetrical. The mean, median, and mode
coincide.

2. The maximum ordinate of the curve occurs at the mean, that is,
where z = 0 in a normal z score distribution, and in the unit normal
curve is equal to .3989.

3. The curve is asymptotic. It approaches but does not meet the
horizontal axis and extends from minus infinity to plus infinity.

4. The points of inflection of the curve occur at points plus or
minus one standard deviation unit above and below the mean. Thus
the curve changes from convex to concave in relation to the
horizontal axis at these points.

5. Roughly 68% of the area of the curve falls within the limits plus
or minus one standard deviation unit from the mean.

6. In the unit normal curve the limits z = +/- 1.96 include 95% and
the limits z = +/- 2.58 include 99% of the total area of the curve,
5% and 1% of the area, respectively, falling beyond these limits.

(Ferguson, 1976, p. 98)
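
The numerical properties listed above can be checked directly. The following is a minimal Python sketch, not part of the original paper; the function names `normal_pdf` and `area_within` are illustrative only.

```python
import math


def normal_pdf(z):
    """Height (ordinate) of the unit normal curve at z."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def area_within(z):
    """Area under the unit normal curve between -z and +z."""
    return math.erf(z / math.sqrt(2.0))


if __name__ == "__main__":
    # Property 2: maximum ordinate at z = 0
    print("ordinate at z = 0:", round(normal_pdf(0.0), 4))
    # Properties 5 and 6: areas within 1, 1.96, and 2.58 standard deviations
    for limit in (1.0, 1.96, 2.58):
        print("area within +/-", limit, ":", round(area_within(limit), 4))
```

The areas come from the error function, which gives the same information as a printed table of the unit normal curve.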

Univariate Normality 



Before proceeding to a discussion of multivariate normality, it is important to review
univariate and bivariate normality because “normality on each of the variables separately is a
necessary, but not sufficient, condition for multivariate normality to hold” (Stevens, 1996, p. 243).
Analysis of variance (ANOVA) tests whether between group means differ and has as one of its
assumptions that the dependent variable should be normally distributed. ANOVA is robust with
respect to the normality assumption and skewness has very little effect (generally only a few
hundredths) on level of significance or power if the design is “balanced” (i.e., equal number of
observations per cell). Platykurtosis (flattened distribution relative to the normal distribution)
attenuates power (Stevens, 1996).

Univariate tests for assessing normality may be graphical or nongraphical. To graphically
determine univariate normality, a Q-Q plot (quantile-versus-quantile) compares observed values
with expected normal distribution values. In these plots, scores are ranked and sorted. An
expected normal value is computed and compared with the actual normal value for each case.
The expected normal value is the position a case with that rank holds in a normal distribution; the
normal value is the position it holds in the actual distribution. If the actual distribution is normal,
the points for the cases fall along the diagonal running from lower left to upper right, with some
minor deviations secondary to random processes (Tabachnick & Fidell, 1989).
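
The ranking step behind a Q-Q plot can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the SPSS algorithm itself: it uses Blom's proportion estimate, (rank - 3/8)/(n + 1/4), which is the FRACTION=BLOM option named in Appendix A, returns expected values as unit normal z scores, and does not average tied ranks. The function name `expected_normal_values` is hypothetical.

```python
from statistics import NormalDist


def expected_normal_values(scores):
    """Pair each sorted score with its expected unit normal quantile.

    Assumption: Blom's plotting position, (rank - 3/8) / (n + 1/4).
    Tied scores are not averaged in this sketch.
    """
    n = len(scores)
    unit_normal = NormalDist()  # mean 0, standard deviation 1
    pairs = []
    for rank, value in enumerate(sorted(scores), start=1):
        proportion = (rank - 0.375) / (n + 0.25)
        pairs.append((value, unit_normal.inv_cdf(proportion)))
    return pairs


if __name__ == "__main__":
    # Hypothetical scores, used only to show the shape of the output
    for observed, expected in expected_normal_values([4.1, 2.7, 5.9, 3.3, 4.8]):
        print(observed, round(expected, 3))
```

Plotting the observed values against the expected values gives the Q-Q plot; points close to a straight line support the normality assumption.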

Figure 1 graphically displays a variable with one hundred responses in increasing order of 
magnitude plotted against expected normal distribution values. Normality is tenable in this 
instance because the plot resembles a straight line. Figure 2 is an arrangement of 50 responses for 
a variable in increasing order of magnitude plotted against expected normal distribution values. 
Normality is not tenable in this instance because the plot does not resemble a straight line. Only 
two points are plotted when n = 50. In this instance, other pictorial representations assist in the 
determination of normality. 

Q-Q plots are available using the graphs menu on SPSS (Appendix A). SPSS also 
provides stem and leaf plots (e.g., Figure 3) and histograms (e.g., Figure 4) for visualization of
normality. The normal curve, as presented in basic statistical texts, is more readily visualized in
stem and leaf plots and histograms. Figures 3 and 4 demonstrate the classic bell curve using the
one hundred responses denoted in Figure 1. Figures 5 and 6 fail to demonstrate normality using
the 50 responses denoted in Figure 2. It is important to remember that with small or moderate
sample sizes, it may be difficult to tell whether graphic non-normality is real or apparent 
(Gnanadesikan, 1977; Neter, Kutner, Nachtsheim, & Wasserman, 1996; Norusis, 1995). 

The most powerful nongraphical tests for determining univariate normality include the
skewness and kurtosis coefficients and the Shapiro-Wilk test (Stevens, 1996). In SPSS, this
information can be obtained with the Explore procedure (Appendix A). Note that SPSS will print
the Shapiro-Wilk statistic for samples with fewer than 50 observations and the K-S Lilliefors
statistic for samples with more than 50 observations. Table 1 shows the SPSS Descriptives
printout for data with 100 responses and Table 2 shows the SPSS Descriptives printout for data
with 26 responses.

Fisher’s Measure of Skewness. This statistic is based on deviations from the mean to the 
third power. A symmetrical curve will result in a value of 0. If the skewness value is positive, 
then the curve is skewed to the right, and vice versa. Dividing the measure of skewness by the 
standard error for skewness results in a number that is interpreted in terms of the normal curve. 
Values above +1.96 or below -1.96 are statistically significant because 95% of the scores in the
normal distribution fall between +1.96 and -1.96 standard deviations from the mean. Because this
statistic is based on deviations to the third power, it is very sensitive to extreme values (Munro & 
Page, 1993). The coefficient in Table 1 is not statistically significant; the coefficient in Table 2
(.9959/.4556 = 2.19) falls just beyond the 1.96 cutoff.

Fisher’s Measure of Kurtosis. This statistic indicates whether a distribution is too flat or 
too peaked, being based on deviations from the mean to the fourth power. If the kurtosis value is
positive, the distribution is too peaked to be normal; if the kurtosis value is negative, the curve is 
too flat to be normal. The kurtosis statistic is divided by the standard error for kurtosis and the 
values compared to the +/- 1.96 range used to determine skewness (Munro & Page, 1993). The 
coefficients in Tables 1 and 2 are not statistically significant. 
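
The ratio test described for skewness and kurtosis can be sketched as follows. The standard-error formulas below are the ones commonly documented for SPSS; treating them as what produced the printouts is an assumption, checked here only against the standard errors printed in Tables 1 and 2. All function names are illustrative. Note that with the Table 2 values as printed, the skewness ratio comes out just above 1.96.

```python
import math


def se_skewness(n):
    """Standard error of Fisher's skewness for a sample of size n (assumed formula)."""
    return math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def se_kurtosis(n):
    """Standard error of Fisher's kurtosis for a sample of size n (assumed formula)."""
    return math.sqrt(4.0 * (n * n - 1) * se_skewness(n) ** 2 / ((n - 3) * (n + 5)))


def z_ratio(statistic, standard_error):
    """Coefficient divided by its standard error."""
    return statistic / standard_error


def is_significant(ratio, cutoff=1.96):
    """True when the ratio falls outside the +/- cutoff range."""
    return abs(ratio) > cutoff


if __name__ == "__main__":
    # Coefficients as printed in Tables 1 and 2
    printed = {
        "Table 1": {"n": 100, "skewness": 0.0000, "kurtosis": -0.0900},
        "Table 2": {"n": 26, "skewness": 0.9959, "kurtosis": 1.6858},
    }
    for label, row in printed.items():
        skew_ratio = z_ratio(row["skewness"], se_skewness(row["n"]))
        kurt_ratio = z_ratio(row["kurtosis"], se_kurtosis(row["n"]))
        print(label, round(skew_ratio, 2), is_significant(skew_ratio),
              round(kurt_ratio, 2), is_significant(kurt_ratio))
```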

Shapiro-Wilk Test. Shapiro and Wilk developed a test for normality that is sensitive to a
wide variety of alternatives to the normal. Small values of W correspond to departure from
normality. If observed significance levels are reasonably large (greater than 0.1), normality is not
an unreasonable assumption (Gnanadesikan, 1977). The Shapiro-Wilk statistic in Table 2 is 
sufficiently large so that the assumption of normality is tenable. 

Bivariate Normality 

The normal correlation model for the case of two variables is based on the bivariate
normal distribution. Consider the vocabulary (X1) scores and math (X2) scores for a group of
students from Table 3. The students' score combinations form a scatter diagram (Figure 7). The
centroid (X1 = 17.6, X2 = 16.1) is the center of the 10 cases (Tatsuoka, 1971b). If there were a
large population of students, a clustering of points would be expected around the centroid with a
gradual thinning as the distance from the centroid increases. To depict this in a manner
analogous to the normal curve, a third dimension, frequency, is needed perpendicular to the (X1,
X2) plane.
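
As a small worked check, the centroid can be computed from the ten score pairs in Table 3. This Python sketch is an illustration only; the helper name `centroid` is hypothetical.

```python
# Scores for the 10 pupils in Table 3
vocabulary = [19, 20, 17, 16, 19, 17, 18, 17, 15, 18]
math_scores = [15, 18, 18, 12, 16, 16, 13, 20, 17, 16]


def centroid(*variables):
    """Return the mean of each variable; together the means define the centroid."""
    return tuple(sum(values) / len(values) for values in variables)


if __name__ == "__main__":
    x1_mean, x2_mean = centroid(vocabulary, math_scores)
    print(round(x1_mean, 1), round(x2_mean, 1))
```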

The surface will resemble a bell shaped “mound” similar to Figures 8, 9, 10, and 11, with
the apex vertically above the centroid (Karson, 1982; Neter, Kutner, Nachtsheim, & Wasserman,
1996; Tatsuoka, 1971a, 1971b). For every pair of values (X1, X2), the density f(X1, X2)
represents the height of the surface at that very point. The surface is continuous, with probability
corresponding to the volume under the surface (Neter, Kutner, Nachtsheim, & Wasserman,
1996). Though this conveys a general impression, it is customary to represent the bivariate curve
with a series of contour lines. These contour lines (Figure 12) are a series of concentric ellipses
and their common center is the centroid. The statistical implication of the volume under the
bivariate normal surface of a given elliptical region is parallel to the meaning of the area under the
normal curve over a given interval. It represents the probability that a random bivariate
observation, when plotted as a point on the (X1, X2) plane, will lie within the elliptical region. For
example, in Figure 12, an observation that falls in the small ellipse has an 80% chance of being
included in the sample because it is close to the mean, whereas an observation that falls in the
large ellipse has a 20% chance of being included in the sample because it is far from the mean
(Morrison, 1983). The contour is a cross section of the surface made by a plane parallel to the
(X1, X2) plane. Thinking must still be three dimensional because the bell shaped “mound” is
being sliced into sections, with the top part of the “mound” being the top of the normal curve and
the bottom part of the “mound” being the bottom of the normal curve. Thus, bivariate normality
is checked by graphing X1 and X2 and noting the scatter of the variables around the centroid. The
pattern should be elliptical (Karson, 1982; Neter, Kutner, Nachtsheim, & Wasserman, 1996;
Tatsuoka, 1971a, 1971b).
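
The height of the bivariate normal surface can be sketched with the standard density formula. In the Python illustration below, the centroid (17.6, 16.1) comes from the text, while the standard deviations and the correlation are hypothetical values chosen only to show the shape of the surface; the function name is also illustrative.

```python
import math


def bivariate_normal_density(x1, x2, mean1, mean2, sd1, sd2, rho):
    """Height f(x1, x2) of the bivariate normal surface at the point (x1, x2)."""
    z1 = (x1 - mean1) / sd1
    z2 = (x2 - mean2) / sd2
    # Points with the same value of q lie on the same elliptical contour
    q = (z1 * z1 - 2.0 * rho * z1 * z2 + z2 * z2) / (1.0 - rho * rho)
    scale = 2.0 * math.pi * sd1 * sd2 * math.sqrt(1.0 - rho * rho)
    return math.exp(-q / 2.0) / scale


if __name__ == "__main__":
    # Centroid from the text; sd1, sd2, and rho are hypothetical
    params = dict(mean1=17.6, mean2=16.1, sd1=1.5, sd2=2.4, rho=0.2)
    apex = bivariate_normal_density(17.6, 16.1, **params)
    away = bivariate_normal_density(20.0, 12.0, **params)
    print(apex > away)  # the surface is highest above the centroid
```

The quantity `q` in the sketch is constant along each ellipse, which is why slicing the “mound” parallel to the (X1, X2) plane produces concentric elliptical contours.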

Multivariate Normality 

Multivariate normality is assessed to verify the reasonableness of assuming normality for a 
given body of multiresponse Questions. As can be imagined, there are many possibilities for 
departure from normality with multiresponse data. A preliminary step in evaluating the normality 
of multiresponse data is to evaluate univariate normality for each of the variables. In the printout 
of the MULTINOR Program written by Thompson (1990) (Appendix B), univariate normality for 
each of the three variables was checked using Q-Q Plots, stem and leaf plots, histograms, the 
Shapiro-Wilk’s statistic, and skewness and kurtosis coefficients (Figures 13 through 21; Tables 4 
and 5). The Q-Q plots of the three variables (Figures 13, 14, and 15) show that normality is 
tenable for variable one because the plot resembles a straight line but normality is not as tenable
for variables two and three because the plots do not resemble a straight line. The stem and leaf
plot and histogram of variable one (Figures 16 and 19) reveal a somewhat normal
distribution while the stem and leaf plots and histograms of variables two (Figures 17 and 20) and
three (Figures 18 and 21) reveal negatively skewed and trimodal distributions, respectively. The
descriptives data (Tables 4 and 5) reveal skewness and kurtosis statistics that are not statistically
significant for all three variables and Shapiro-Wilk statistics that are sufficiently large for
variables one and three to make the normality assumption not unreasonable. Univariate normality
cannot be assumed for these variables. Remember that univariate normality was discussed
because “normality on each of the variables separately is a necessary, but not sufficient, condition
for multivariate normality to hold” (Stevens, 1996, p. 243). 

Next, for multivariate normality to hold, any linear combination of the variables must be normally
distributed and all subsets of the set of variables must have multivariate normal distributions. This 
condition implies that all pairs of variables must be bivariate normal (Stevens, 1996). Bivariate 
normality was checked for in the MULTINOR data (Appendix B) by requesting scatterplots and 
noting elliptical patterns for the three possible combinations of the variables (Figures 22 through 
24). A cursory view of the patterns around the centroids does not reveal a clear elliptical pattern. 
Measuring and connecting the variables to form elliptical patterns based on percentages (80%, 
60%, 40%, and 20%) of variables around the centroid assists in visualizing the ellipses. 

The data can finally be checked for multivariate normality by calculating the Mahalanobis 
distance (D 2 ) for each subject (Thompson, 1990). The Mahalanobis distance is the distance of a 
case from the centroid of the remaining cases where the centroid is the point defined by the means 
of all the variables (Tabachnick & Fidell, 1989). Basically, it indicates how far a case is from the 
centroid of all cases for the predictor variables. A large distance indicates an observation that is 
an outlier for the predictors. The Mahalanobis distance is the accepted measure of distance
between two (quantitative) multivariate populations and is independent of sample size 
(Krzanowski, 1988; Stevens, 1996). 



In the MULTINOR printout (Appendix B), the D² can be calculated for each subject using
the formula $D^2_i = (x_i - \bar{x})' S^{-1} (x_i - \bar{x})$, where $x_i$ is the vector of data
for case i and $\bar{x}$ is the vector of means (centroid) for the predictors. Using the data for
subject eight from the MULTINOR printout, the equation for subject eight would be as follows
(numbers are rounded to the nearest tenth):

$$
D^2_8 =
\underbrace{\begin{pmatrix} 0.3 & -0.9 & 0.5 \end{pmatrix}}_{1 \times 3}
\underbrace{\begin{pmatrix}
 0.57 & -0.12 & -0.37 \\
-0.12 &  0.33 & -0.26 \\
-0.37 & -0.26 &  0.92
\end{pmatrix}}_{3 \times 3}
\underbrace{\begin{pmatrix} 0.3 \\ -0.9 \\ 0.5 \end{pmatrix}}_{3 \times 1}
= 0.69408
$$


Based on the formula, the matrices are 1 x 3, 3 x 3, and 3 x 1. To determine the numbers for the
equation, first subtract the mean of each variable from the scores of the selected subject to form
the 1 x 3 and 3 x 1 matrices and use the inverted variance/covariance matrix from the printout for
S⁻¹. The results will match the Mahalanobis distances given on the second page of the
MULTINOR printout. After the distances are calculated, the values are sorted in ascending order
and paired with a derived chi-square value [(j - 0.5)/n = percentile for the chi-square]. A table or
computer program is required to determine p values because each chi square is not at the standard
0.01 or 0.05 levels (see the second page of the MULTINOR printout). The pairs are then plotted
in a scattergram (see the third page of the MULTINOR printout). If n (number of subjects in the
sample) - p (number of variables) is greater than 25, the plot should resemble a straight line.
Conceptually, it is important to remember that the inverted variance/covariance matrix serves as a
constant in the equation. Just by looking at the 1 x 3 and 3 x 1 matrices and their relation to the 
centroid, deciding where a subject will fall on a graph is possible. Order inferred distance can be 
estimated without the inverted variance/covariance matrix. 
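
The pairing of each sorted distance with a derived chi-square value can be sketched without a printed table. The Python illustration below is not the MULTINOR code: it computes the chi-square cumulative probability from a series for the incomplete gamma function and inverts it by bisection, a simple technique swapped in here for whatever routine the original programs use. The function names are hypothetical.

```python
import math


def chi_square_cdf(x, df):
    """Cumulative probability of a chi-square variate with df degrees of freedom."""
    if x <= 0.0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    # Series for the regularized lower incomplete gamma function P(a, z)
    term = 1.0 / a
    total = term
    k = 1
    while abs(term) > 1e-15 * abs(total):
        term *= z / (a + k)
        total += term
        k += 1
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))


def chi_square_quantile(p, df):
    """Chi-square value whose cumulative probability is p, found by bisection."""
    low, high = 0.0, 1.0
    while chi_square_cdf(high, df) < p:
        high *= 2.0
    for _ in range(200):
        middle = (low + high) / 2.0
        if chi_square_cdf(middle, df) < p:
            low = middle
        else:
            high = middle
    return (low + high) / 2.0


def derived_chi_squares(n, number_of_variables):
    """Chi-square value for each sorted position j, using the percentile (j - 0.5) / n."""
    return [chi_square_quantile((j - 0.5) / n, number_of_variables)
            for j in range(1, n + 1)]


if __name__ == "__main__":
    # 12 cases and 3 variables, as in the Appendix C example
    for j, value in enumerate(derived_chi_squares(12, 3), start=1):
        print(j, round((j - 0.5) / 12, 5), round(value, 5))
```

For 12 cases and 3 variables the first and last derived values agree with the CHISQ column of the Appendix C printout (.30897 and 8.22056).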

Looking at the MULTINOR scatterplot (Appendix B), each subject can be identified. 
Subject 8 is the first * in the lower left hand corner because the D²/chi-square value is the closest
to the centroid; subject 17 is the * in the far upper right hand corner because the D²/chi-square
value is farthest from the centroid (0/0). Again, distance indicates how far the case is from the
centroid and if the plot resembles a straight line, normality is more tenable. The Mahalanobis 
distance represents the coordinate for the three means. In a multivariate normal curve, the cases 
will cluster around the centroid and taper off as the distance increases. 

Thompson (1997) wrote an SPSS program to test multivariate normality graphically 
(Appendix C). Note the commands on the first page of the program. Page two of the program 
lists all of the variables for the data set and their means. On page three of the program, the 
Mahalanobis statistics are listed with the residual statistics. Page four details the Mahalanobis 
Distances for each subject in ascending order (subject number six is first; subject number three is 
last). The distances are paired with Chi Square values and graphed (page six). 
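
The Mahalanobis distances for the 12 cases in the Appendix C data file can be reproduced with a short Python sketch. This is an illustration under stated assumptions rather than a translation of the SPSS or Fortran programs: it uses the sample variance/covariance matrix (divisor n - 1) and a plain Gauss-Jordan inversion, and every function name is hypothetical.

```python
# The 12 cases from multinor.dat (x1, x2, x3)
DATA = [
    (2.4, 2.1, 2.4), (3.5, 1.8, 3.9), (6.7, 3.6, 5.9), (5.3, 3.3, 6.1),
    (5.2, 4.1, 6.4), (3.2, 2.7, 4.0), (4.5, 4.9, 5.7), (3.9, 4.7, 4.7),
    (4.0, 3.6, 2.9), (5.7, 5.5, 6.2), (2.4, 2.9, 3.2), (2.7, 2.6, 4.1),
]


def column_means(rows):
    """Vector of means (the centroid)."""
    n = len(rows)
    return [sum(row[j] for row in rows) / n for j in range(len(rows[0]))]


def covariance_matrix(rows, means):
    """Sample variance/covariance matrix with divisor n - 1 (assumption)."""
    n, p = len(rows), len(means)
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(p)] for i in range(p)]


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def mahalanobis_squared(row, means, s_inverse):
    """D squared = (x - mean)' S^-1 (x - mean) for one case."""
    d = [x - m for x, m in zip(row, means)]
    p = len(d)
    return sum(d[i] * s_inverse[i][j] * d[j] for i in range(p) for j in range(p))


def sorted_distances(rows):
    """Return (case number, D squared) pairs in ascending order of distance."""
    means = column_means(rows)
    s_inverse = invert(covariance_matrix(rows, means))
    pairs = [(case, mahalanobis_squared(row, means, s_inverse))
             for case, row in enumerate(rows, start=1)]
    return sorted(pairs, key=lambda pair: pair[1])


if __name__ == "__main__":
    for case, distance in sorted_distances(DATA):
        print(case, round(distance, 5))
```

Sorting places case 6 first and case 3 last, as described for page four of the printout. Plotting these sorted distances against the derived chi-square values gives the scattergram used to judge multivariate normality.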

Homogeneity of Variance-Covariance Matrices 

An indirect way to assess multivariate normality is to test the assumption that the 
variance-covariance matrices within each cell of the design are sampled from the same population 
variance-covariance matrix. If the matrices are sampled from the same population, they can 
reasonably be pooled to create a single estimate of error. Evaluation of homogeneity of variance-
covariance matrices is especially important when sample sizes are not equal.

SPSS MANOVA conducts a Box’s M test to determine homogeneity of the variance-
covariance matrices. The null hypothesis for the Box’s M test is that the variance-covariance
matrices are equal across cells; therefore, a p value greater than 0.05 is desired. If the
assumption for multivariate generalization of homogeneity of variance is met, then it is likely that 
the assumption for multivariate normality is also met. This paper will not discuss in depth the 
relationship between normality and homogeneity and refers the reader to Tabachnick and Fidell 
(1989) for further exploration. 

Conclusion 

Although multivariate normality is not required to estimate most multivariate parameters 
(e.g., function coefficients, structure coefficients), even in these cases the distributions of the 
variables must be reasonably comparable. To test for multivariate normality, univariate and 
bivariate assumptions should be met in addition to calculating Mahalanobis distances and plotting 
them against a derived chi-square value to note their linearity. If the assumption for multivariate 
normality is met solely through calculation of Mahalanobis distances and graphically noting 
linearity, then the assumptions for univariate and bivariate normality are met. However, if data 
are determined to be univariate and bivariate normal, it may not be assumed to be multivariate 
normal. Computer programs are available to ease calculations to determine normality, including 
Thompson’s Multinor (1990, 1997) program. 
Appendix A

SPSS Commands

PPLOT
  /VARIABLES=one
  /NOLOG
  /NOSTANDARDIZE
  /TYPE=Q-Q
  /FRACTION=BLOM
  /TIES=MEAN
  /DIST=NORMAL.

GRAPH
  /HISTOGRAM=one.

EXAMINE
  VARIABLES=one two three
  /PLOT BOXPLOT STEMLEAF HISTOGRAM NPPLOT
  /COMPARE GROUP
  /STATISTICS DESCRIPTIVES
  /CINTERVAL 95
  /MISSING LISTWISE
  /NOTOTAL.

GRAPH
  /SCATTERPLOT(BIVAR)=one WITH three
  /MISSING=LISTWISE.

PLOT
  /VERTICAL='VARIABLE ONE' REFERENCE (6.4)
  /HORIZONTAL='VARIABLE THREE' REFERENCE (6.7)
  /PLOT=ONE WITH THREE.

GRAPH
  /SCATTERPLOT(BIVAR)=one WITH two
  /MISSING=LISTWISE.

PLOT
  /VERTICAL='VARIABLE ONE' REFERENCE (6.4)
  /HORIZONTAL='VARIABLE TWO' REFERENCE (6.9)
  /PLOT=ONE WITH TWO.

GRAPH
  /SCATTERPLOT(BIVAR)=two WITH three
  /MISSING=LISTWISE.

PLOT
  /VERTICAL='VARIABLE TWO' REFERENCE (6.9)
  /HORIZONTAL='VARIABLE THREE' REFERENCE (6.7)
  /PLOT=TWO WITH THREE.
Appendix C



multino2 .aer 10/11/97 



multinor.sps

SET BLANKS=SYSMIS UNDEFINED=WARN printback=list .
TITLE 'MULTINOR.SPS tests multivar normality graphically****' .
COMMENT ********************************************************
COMMENT The original MULTINOR computer program was presented,
COMMENT with examples, in:
COMMENT Thompson, B. (1990). MULTINOR: A FORTRAN program that
COMMENT assists in evaluating multivariate normality.
COMMENT Educational and Psychological Measurement, 50,
COMMENT 845-848.
COMMENT
COMMENT The logic and the data source for the example are from:
COMMENT Stevens, J. (1986). Applied multivariate statistics
COMMENT for the social sciences. Hillsdale, NJ: Erlbaum.
COMMENT (pp. 207-212)
COMMENT ********************************************************
COMMENT Here there are 3 variables for which multivariate
COMMENT normality is being confirmed.
DATA LIST
  FILE='c:\spsswin\multinor.dat' FIXED RECORDS=1 TABLE
  /1 x1 1-3 (1) x2 5-7 (1) x3 9-11 (1).
list variables=all/cases=9999/format=numbered .
COMMENT 'y' is a variable automatically created by the program, and
COMMENT does not have to be modified for different data sets.
compute y=$casenum .
print formats y(F5) .
regression variables=y x1 to x3/
  descriptive=mean stddev corr/
  dependent=y/enter x1 to x3/
  save=mahal(mahal) .
sort cases by mahal(a) .
execute .
list variables=x1 to x3 mahal/cases=9999/format=numbered .
COMMENT In the next TWO lines, for a given data set put the actual sample size
COMMENT in place of the number '12' used for the example data set.
loop #i=1 to 12 .
COMMENT In the next line, change '3' to whatever is the number
COMMENT of variables.
COMMENT The p critical value of chi square for a given case
COMMENT is set as [the case number (after sorting) - .5] / the
COMMENT sample size.
compute p=($casenum - .5) / 12 .
compute chisq=idf.chisq(p,3) .
end loop .
print formats p chisq (F8.5) .
list variables=y p mahal chisq/cases=9999/format=numbered .
plot
  vertical='chi square'/
  horizontal='Mahalabis distance'/
  plot=chisq with mahal .

multinor.dat

2.4 2.1 2.4
3.5 1.8 3.9
6.7 3.6 5.9
5.3 3.3 6.1
5.2 4.1 6.4
3.2 2.7 4.0
4.5 4.9 5.7
3.9 4.7 4.7
4.0 3.6 2.9
5.7 5.5 6.2
2.4 2.9 3.2
2.7 2.6 4.1
multinor.lst

-> SET BLANKS=SYSMIS UNDEFINED=WARN printback=list .
-> TITLE 'MULTINOR.SPS tests multivar normality graphically****'.
-> COMMENT ********************************************************
-> COMMENT The original MULTINOR computer program was presented,
-> COMMENT with examples, in:
-> COMMENT Thompson, B. (1990). MULTINOR: A FORTRAN program that
-> COMMENT assists in evaluating multivariate normality.
-> COMMENT Educational and Psychological Measurement, 50,
-> COMMENT 845-848.
-> COMMENT
-> COMMENT The logic and the data source for the example are from:
-> COMMENT Stevens, J. (1986). Applied multivariate statistics
-> COMMENT for the social sciences. Hillsdale, NJ: Erlbaum.
-> COMMENT (pp. 207-212)
-> COMMENT ********************************************************
-> COMMENT Here there are 3 variables for which multivariate
-> COMMENT normality is being confirmed.
-> DATA LIST
->   FILE='c:\spsswin\multinor.dat' FIXED RECORDS=1 TABLE
->   /1 x1 1-3 (1) x2 5-7 (1) x3 9-11 (1).
-> list variables=all/cases=9999/format=numbered .





        X1    X2    X3

   1   2.4   2.1   2.4
   2   3.5   1.8   3.9
   3   6.7   3.6   5.9
   4   5.3   3.3   6.1
   5   5.2   4.1   6.4
   6   3.2   2.7   4.0
   7   4.5   4.9   5.7
   8   3.9   4.7   4.7
   9   4.0   3.6   2.9
  10   5.7   5.5   6.2
  11   2.4   2.9   3.2
  12   2.7   2.6   4.1

Number of cases read: 12    Number of cases listed: 12

-> COMMENT 'y' is a variable automatically created by the program, and
-> COMMENT does not have to be modified for different data sets.
-> compute y=$casenum .
-> print formats y(F5) .
-> regression variables=y x1 to x3/
->   descriptive=mean stddev corr/
->   dependent=y/enter x1 to x3/
->   save=mahal(mahal) .

* * * *  M U L T I P L E   R E G R E S S I O N  * * * *

Listwise Deletion of Missing Data

          Mean   Std Dev   Label
Y        6.500     3.606
X1       4.125     1.384
X2       3.483     1.147
X3       4.625     1.406

N of Cases = 12

Correlation:

            Y       X1       X2       X3
Y       1.000    -.207     .376    -.044
X1      -.207    1.000     .606     .845
X2       .376     .606    1.000     .656
X3      -.044     .845     .656    1.000

* * * *  M U L T I P L E   R E G R E S S I O N  * * * *

Equation Number 1    Dependent Variable..   Y

Block Number 1.  Method:  Enter

Variable(s) Entered on Step Number
   1..    X3
   2..    X2
   3..    X1

Multiple R            .66417
R Square              .44112
Adjusted R Square     .23154
Standard Error       3.16069

Analysis of Variance
                DF    Sum of Squares    Mean Square
Regression       3          63.08053       21.02684
Residual         8          79.91947        9.98993

F =   2.10480       Signif F =  .1780

------------------ Variables in the Equation ------------------

Variable              B        SE B        Beta         T    Sig T
X1            -1.909097    1.296480    -.733029    -1.473    .1791
X2             2.445453    1.110369     .778083     2.202    .0588
X3              .165296    1.345478     .064454      .123    .9053
(Constant)     5.092203    3.454771                 1.474    .1787

End Block Number 1   All requested variables entered.

* * * *  M U L T I P L E   R E G R E S S I O N  * * * *

Equation Number 1    Dependent Variable..   Y

Residuals Statistics:

               Min        Max       Mean    Std Dev     N
*PRED       2.0801     9.9172     6.5000     2.3947    12
*ZPRED     -1.8457     1.4270      .0000     1.0000    12
*SEPRED     1.2118     2.4798     1.7932      .3534    12
*ADJPRED     .6074    10.6661     6.2406     2.9511    12
*RESID     -5.0425     5.0265      .0000     2.6954    12
*ZRESID    -1.5954     1.5903      .0000      .8528    12
*SRESID    -1.9334     1.8781      .0291     1.0420    12
*DRESID    -7.4057     7.0104      .2594     4.0901    12
*SDRESID   -2.4778     2.3496      .0287     1.2152    12
*MAHAL       .7004     5.8543     2.7500     1.5070    12
*COOK D      .0000      .4543      .1364      .1713    12
*LEVER       .0637      .5322      .2500      .1370    12

Total Cases = 12
From Equation 1:  1 new variables have been created.

  Name      Contents
  MAHAL     Mahalanobis' Distance

-> sort cases by mahal(a) .
-> execute .
-> list variables=x1 to x3 mahal/cases=9999/format=numbered .

Number of cases read: 12    Number of cases listed: 12

-> COMMENT In the next TWO lines, for a given data set put the actual sample size
-> COMMENT in place of the number '12' used for the example data set.
-> loop #i=1 to 12 .
-> COMMENT In the next line, change '3' to whatever is the number
-> COMMENT of variables.
-> COMMENT The p critical value of chi square for a given case
-> COMMENT is set as [the case number (after sorting) - .5] / the
-> COMMENT sample size.
-> compute p=($casenum - .5) / 12 .
-> compute chisq=idf.chisq(p,3) .
-> end loop .
-> print formats p chisq (F8.5) .



-> list variables=y p mahal chisq/cases=9999/format=numbered .

         Y         P      MAHAL      CHISQ

   1     6    .04167     .70038     .30897
   2    11    .12500    1.65042     .69236
   3     5    .20833    1.98854    1.03962
   4     8    .29167    2.17303    1.38807
   5    12    .37500    2.19634    1.75398
   6     7    .45833    2.22174    2.15099
   7     4    .54167    2.37118    2.59519
   8     2    .62500    2.53196    3.10983
   9     1    .70833    2.59346    3.73392
  10    10    .79167    3.12622    4.54475
  11     9    .87500    5.59246    5.73941
  12     3    .95833    5.85428    8.22056

Number of cases read: 12    Number of cases listed: 12
-> plot
->   vertical='chi square'/
->   horizontal='Mahalabis distance'/
->   plot=chisq with mahal .

Hi-Res Chart # 6: Plot of chisq with mahal

[Plot of CHISQ with MAHAL: chi square on the vertical axis, Mahalabis distance on the horizontal axis]



Table 1 
SPSS Descriptives Printout for a Variable with 100 Responses Demonstrating Normality

X

Valid cases: 100.0    Missing cases: .0    Percent missing: .0

Mean       .0000    Std Err     .1005    Min     -2.6000    Skewness     .0000
Median     .0000    Variance   1.0099    Max      2.6000    S E Skew     .2414
5% Trim    .0000    Std Dev    1.0049    Range    5.2000    Kurtosis    -.0900
95% CI for Mean (-.1994, .1994)          IQR      1.4000    S E Kurt     .4783

                     Statistic     df     Significance
K-S (Lilliefors)         .0253    100          > .2000
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Table 2 
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SPSS Descriptives Printout for a Variable with 26 Responses Failing to Demonstrate Normality 



ONE

Valid cases: 26.0    Missing cases: .0    Percent missing: .0

Mean      6.4038    Std Err    .4171    Min      2.9000    Skewness    .9959
Median    6.0500    Variance  4.5228    Max     12.5000    S E Skew    .4556
5% Trim   6.2791    Std Dev   2.1267    Range    9.6000    Kurtosis   1.6858
95% CI for Mean (5.5449, 7.2628)        IQR      2.8250    S E Kurt    .8865

                     Statistic     df     Significance
Shapiro-Wilks          .9424       26        .2169
K-S (Lilliefors)       .1151       26       > .2000
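The standard errors of skewness and kurtosis in Tables 1 and 2 depend only on the sample size. As a check on the printouts, the sketch below uses the usual small-sample formulas for these standard errors; that SPSS used exactly these formulas is an assumption here, although the results agree with the printed values to four decimals. The function names are illustrative.

```python
import math

def se_skewness(n):
    """Standard error of the sample skewness coefficient for n cases."""
    return math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

def se_kurtosis(n):
    """Standard error of the sample kurtosis coefficient for n cases."""
    return 2.0 * se_skewness(n) * math.sqrt((n * n - 1.0) / ((n - 3) * (n + 5)))

for n in (100, 26):
    print(n, round(se_skewness(n), 4), round(se_kurtosis(n), 4))
```

For n = 100 the functions return approximately .2414 and .4783, and for n = 26 approximately .4556 and .8865. Dividing a coefficient by its standard error gives a rough z-type index; for variable ONE, .9959 / .4556 is about 2.19.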
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Table 3 

Vocabulary and Math Scores from 10 Students

Pupil Number    Vocabulary Test (X1)    Math Test (X2)

      1                  19                   15
      2                  20                   18
      3                  17                   18
      4                  16                   12
      5                  19                   16
      6                  17                   16
      7                  18                   13
      8                  17                   20
      9                  15                   17
     10                  18                   16

   Mean                  17.6                 16.1
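As an illustration only, the two-variable data in Table 3 are small enough to compute squared Mahalanobis distances by hand. The sketch below is not part of the original analysis, which computes distances for the three-variable data set; it assumes the sample covariance matrix (divisor n - 1) and inverts the 2 x 2 matrix directly. The function and variable names are hypothetical.

```python
vocab = [19, 20, 17, 16, 19, 17, 18, 17, 15, 18]
math_scores = [15, 18, 18, 12, 16, 16, 13, 20, 17, 16]

def mahalanobis_sq_2d(xs, ys):
    """Return the two means and each case's squared Mahalanobis distance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Sample variances and covariance (divisor n - 1).
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    det = sxx * syy - sxy ** 2
    d2 = []
    for x, y in zip(xs, ys):
        dx, dy = x - mx, y - my
        # Quadratic form with the inverse of [[sxx, sxy], [sxy, syy]].
        d2.append((syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det)
    return mx, my, d2

mean_vocab, mean_math, distances = mahalanobis_sq_2d(vocab, math_scores)
for pupil, d in enumerate(distances, start=1):
    print(pupil, round(d, 4))
```

A useful check on any such computation is that the squared distances sum to (n - 1) times the number of variables, here 9 x 2 = 18, when the sample covariance matrix is used.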







Table 4
SPSS Descriptives Printout for Variables One, Two, and Three of Multinor Data

                                                  Statistic    Std. Error

ONE      Mean                                       6.4038        .4171
         95% Confidence        Lower Bound          5.5449
         Interval for Mean     Upper Bound          7.2628
         5% Trimmed Mean                            6.2791
         Median                                     6.0500
         Variance                                   4.523
         Std. Deviation                             2.1267
         Minimum                                    2.90
         Maximum                                   12.50
         Range                                      9.60
         Interquartile Range                        2.8250
         Skewness                                    .996         .456
         Kurtosis                                   1.686         .887

TWO      Mean                                       6.8692        .5339
         95% Confidence        Lower Bound          5.7695
         Interval for Mean     Upper Bound          7.9689
         5% Trimmed Mean                            6.8474
         Median                                     7.1000
         Variance                                   7.413
         Std. Deviation                             2.7226
         Minimum                                    3.00
         Maximum                                   11.20
         Range                                      8.20
         Interquartile Range                        5.6750
         Skewness                                    .069         .456
         Kurtosis                                  -1.380         .887

THREE    Mean                                       6.7154        .3568
         95% Confidence        Lower Bound          5.9805
         Interval for Mean     Upper Bound          7.4502
         5% Trimmed Mean                            6.6440
         Median                                     6.3500
         Variance                                   3.310
         Std. Deviation                             1.8194
         Minimum                                    4.20
         Maximum                                   11.00
         Range                                      6.80
         Interquartile Range                        2.?750
         Skewness                                    .344         .456
         Kurtosis                                   -.506         .887

Tests of Normality for Variables One, Two, and Three

             Kolmogorov-Smirnov              Shapiro-Wilk
           Statistic    df    Sig.       Statistic    df    Sig.

ONE          .115       26    .200*        .942       26    .217
TWO          .122       26    .200*        .925       26    .069
THREE        .094       26    .200*        .950       26    .310

* This is a lower bound of the true significance.
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Figure Captions 

Figure 1. Q-Q plot of 100 responses to a variable demonstrating normality.

Figure 2. Q-Q plots of 50 responses to a variable failing to demonstrate normality.

Figure 3. Stem and leaf plot of 100 responses to a variable demonstrating normality.

Figure 4. Histogram of 100 responses to a variable demonstrating normality.

Figure 5. Stem and leaf plots of 50 responses to a variable failing to demonstrate normality.

Figure 6. Histograms of 50 responses to a variable failing to demonstrate normality.

Figure 7. Scattergram of vocabulary and math scores.

Note. From Selected Topics in Advanced Statistics: An Elementary Approach (p. 15), by M. Tatsuoka, 1971, Champaign, Illinois: The Institute for Personality and Ability Testing. Copyright 1971 by the Institute for Personality and Ability Testing.

Figure 8. Graphical representation of a bivariate normal distribution (1).

Note. From Selected Topics in Advanced Statistics: An Elementary Approach (p. 16), by M. Tatsuoka, 1971, Champaign, Illinois: The Institute for Personality and Ability Testing. Copyright 1971 by the Institute for Personality and Ability Testing.

Figure 9. Graphical representation of a bivariate normal distribution (2).

Note. From Multivariate Analysis: Techniques for Educational and Psychological Research (p. 64), by M. Tatsuoka, 1971, New York: John Wiley & Sons. Copyright 1971 by John Wiley & Sons, Inc.

Figure 10. Graphical representation of a bivariate normal distribution (3).

Note. From Multivariate Statistical Methods: An Introduction (p. 52), by M. Karson, 1982, Ames, Iowa: The Iowa State University Press. Copyright 1982 by The Iowa State University Press.

Figure 11. Graphical representation of a bivariate normal distribution (4).

Note. From Applied Linear Statistical Models (p. 633), by J. Neter, M. Kutner, C. Nachtsheim, and W. Wasserman, 1996, Chicago: Irwin. Copyright 1996 by Times Mirror Higher Education Group, Inc.

Figure 12. Contour diagram for a bivariate normal surface.

Note. From Applied Linear Statistical Methods (p. 26), by D. Morrison, 1983, Englewood Cliffs, New Jersey: Prentice-Hall, Inc. Copyright 1983 by Prentice-Hall, Inc.

Figure 13. Q-Q plot of variable one of Multinor data.
Figure 14. Q-Q plot of variable two of Multinor data.
Figure 15. Q-Q plot of variable three of Multinor data.
Figure 16. Stem and leaf plot of variable one of Multinor data.
Figure 17. Stem and leaf plot of variable two of Multinor data.
Figure 18. Stem and leaf plot of variable three of Multinor data.
Figure 19. Histogram of variable one of Multinor data.
Figure 20. Histogram of variable two of Multinor data.
Figure 21. Histogram of variable three of Multinor data.
Figure 22. Scattergram of variables one and three of Multinor data.
Figure 23. Scattergram of variables one and two of Multinor data.
Figure 24. Scattergram of variables two and three of Multinor data.



[Normal Q-Q Plot of X. Horizontal axis: Observed Value. Vertical axis: Expected Normal Value.]

[Normal Q-Q Plot of VAR00001. Horizontal axis: Observed Value. Vertical axis: Expected Normal Value.]

[Normal Q-Q Plot of VAR00002. Horizontal axis: Observed Value. Vertical axis: Expected Normal Value.]

[Stem-and-leaf plot. Columns: Frequency, Stem, Leaf. Each leaf: 1 case(s). Leaves not recoverable.]

[VAR00001 Stem-and-Leaf Plot. Leaves not recoverable.]

[VAR00002 Stem-and-Leaf Plot. Leaves not recoverable.]

[Histogram.]

[Normal Q-Q Plot of ONE. Horizontal axis: Observed Value. Vertical axis: Expected Normal Value.]

[Normal Q-Q Plot of TWO. Horizontal axis: Observed Value. Vertical axis: Expected Normal Value.]

[Normal Q-Q Plot of THREE. Horizontal axis: Observed Value. Vertical axis: Expected Normal Value.]

[Stem-and-leaf plot of variable ONE. Leaves not recoverable.]

Frequency    Stem

   1.00        2
   1.00        3
   5.00        4
   5.00        5
   5.00        6
   4.00        7
   3.00        8
    .00        9
   1.00       10
   1.00    Extremes

Stem width: 1
Each leaf: 1 case(s)

[Histogram of Variable One (ONE). Std. Dev = 2.13, Mean = 6.4, N = 26.00]

[Histogram of Variable Two (TWO). Std. Dev = 2.72, Mean = 6.9, N = 26.00]

[Histogram of Variable Three (THREE). Std. Dev = 1.82, Mean = 6.72, N = 26.00]

[Scattergrams of the Multinor variables. Surviving axis labels: Variable Two, Variable Three.]
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